Spoken language corpora for the nine official African languages of South Africa

Authors

  • Jens Allwood
  • A P Hendrikse
Abstract

In this paper we give an outline of a corpus planning project which aims to develop linguistic resources for the nine official African languages of South Africa in the form of corpora, more specifically spoken language corpora. In the course of the article, we will address issues such as spoken language vs. written language, register vs. activity and normative vs. non-normative approaches to corpus planning. We then give an outline of the design of a spoken language corpus for the nine official African languages of South Africa. We consider issues such as representativity and sampling (urban-rural, dialects, gender, social class and activities), transcription standards and conventions as well as the problems emanating from widespread loans and code switching and other forms of language mix characteristic of spoken language. Finally, we summarise the status of the project at present and plans for the future.


Similar resources

Work on Spoken (Multimodal) Language Corpora in South Africa

This paper describes past, ongoing and planned work on the collection and transcription of spoken language samples for all the South African official languages and as part of this the training of researchers in corpus linguistic research skills. More specifically the work has involved (and still involves) establishing an international corpus linguistic network linked to a network hub at a UNISA...


Developing Text Resources for Ten South African Languages

The development of linguistic resources for use in natural language processing is of utmost importance for the continued growth of research and development in the field, especially for resource-scarce languages. In this paper we describe the process and challenges of simultaneously developing multiple linguistic resources for ten of the official languages of South Africa. The project focussed o...


The Effect of South African Geopolitical Position in the Development of Cinema in South Africa

This paper discusses the role of South Africa's geopolitical location in the development of film and the cinema industry in the country. Although cinema was brought to South Africa by white Europeans, its development is mostly due to the country's geopolitical position. It is interesting to note that cinema reached Africa at much the same time as it...


Masithethe: speech and language development and difficulties in isiXhosa.

IsiXhosa is the second most spoken language in South Africa and one of its official languages. Spoken mainly in the Eastern and Western Cape regions it is fitting that much of the research focusing on children's isiXhosa speech and language acquisition has been carried out at the University of Cape Town (UCT). We describe what is known about children's acquisition of isiXhosa, and highlight stu...


Identifying phonological processing deficits in Northern Sotho-speaking children: The use of non-word repetition as a language assessment tool in the South African context

Diagnostic testing of speech/language skills in the African languages spoken in South Africa is a challenging task, as standardised language tests in the official languages of South Africa barely exist. Commercially available language tests are in English, and have been standardised in other parts of the world. Such tests are often translated into African languages, a practice that speech langu...



Publication date: 2006